Where to Play: Retrieval of Video Segments using Natural-Language Queries

Authors

  • Sangkuk Lee
  • Daesik Kim
  • Myunggi Lee
  • Jihye Hwang
  • Nojun Kwak
Abstract

In this paper, we propose a new approach for retrieval of video segments using natural language queries. Unlike most previous approaches such as concept-based methods or rule-based structured models, the proposed method uses an image captioning model to construct sentential queries for visual information. In detail, our approach exploits multiple captions generated by visual features in each image with ‘Densecap’. Then, the similarities between captions of adjacent images are calculated and used to track semantically similar captions over multiple frames. Besides introducing this novel idea of ‘tracking by captioning’, the proposed method is one of the first approaches that uses a language generation model learned by neural networks to construct semantic queries describing the relations and properties of visual information. To evaluate the effectiveness of our approach, we have created a new evaluation dataset, which contains about 348 segments of scenes in 20 movie trailers. Through quantitative and qualitative evaluation, we show that our method is effective for retrieval of video segments using natural language queries.
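
The tracking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity measure is used, so a simple token-overlap (Jaccard) score stands in for it here, and the captions are passed in as plain strings rather than produced by Densecap. The function names `jaccard`, `track_captions`, and `retrieve`, as well as the `threshold` parameter, are hypothetical.

```python
from typing import Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two captions (an assumed stand-in measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def track_captions(frames: List[List[str]], threshold: float = 0.5) -> List[Dict]:
    """Link similar captions of adjacent frames into tracks.

    `frames[i]` holds the captions of frame i. Each returned track records the
    first and last frame index it covers and the captions it collected.
    """
    finished: List[Dict] = []
    active: List[Dict] = []  # tracks that were extended in the previous frame
    for idx, captions in enumerate(frames):
        next_active: List[Dict] = []
        used = set()  # caption indices in this frame already claimed by a track
        for track in active:
            best_j, best_s = -1, threshold
            for j, cap in enumerate(captions):
                if j in used:
                    continue
                s = jaccard(track["captions"][-1], cap)
                if s >= best_s:
                    best_j, best_s = j, s
            if best_j >= 0:
                # A sufficiently similar caption exists: extend the track.
                used.add(best_j)
                track["captions"].append(captions[best_j])
                track["end"] = idx
                next_active.append(track)
            else:
                # No match in the adjacent frame: the track ends here.
                finished.append(track)
        # Every unclaimed caption starts a new track.
        for j, cap in enumerate(captions):
            if j not in used:
                next_active.append({"start": idx, "end": idx, "captions": [cap]})
        active = next_active
    finished.extend(active)
    return finished


def retrieve(tracks: List[Dict], query: str, top_k: int = 1) -> List[Tuple[float, int, int]]:
    """Rank tracks by their best caption match; return (score, start, end) tuples."""
    scored = [
        (max(jaccard(query, c) for c in t["captions"]), t["start"], t["end"])
        for t in tracks
    ]
    scored.sort(key=lambda item: -item[0])
    return scored[:top_k]


frames = [
    ["a man riding a horse", "green grass field"],
    ["a man riding a brown horse", "green grass field"],
    ["a car on the road"],
]
tracks = track_captions(frames)
best = retrieve(tracks, "man riding horse")
```

In this toy input the two horse captions are linked into one track spanning frames 0 to 1, and that track is the one returned for the query. A real system would need a stronger sentence similarity than token overlap, which is the main simplification here.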


Similar resources

Indexing an intelligent video database using evolutionary control

In this paper we present the implementation of an intelligent video database using evolutionary control. By using automatic video indexing techniques, the retrieval of video segments can be performed using free natural language queries. Retrieval of video segments from a database for editing and viewing is becoming an important topic in video processing. A cinematic movie consists of video segm...


Speech Recognition and Information Retrieval:

The Informedia Digital Video Library Project at Carnegie Mellon University is creating large digital libraries of video and audio data available for full content retrieval by integrating natural language understanding, image processing, speech recognition and information retrieval. These digital video libraries allow users to explore multi-media data in depth as well as in breadth. The Informed...


Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities

In this paper, we attempt to tackle the MediaEval 2012 Search and Hyperlinking challenge, which focuses on video segment retrieval from a large dataset, based on short natural language queries, as well as linking the resulting segments to related ones. Our approach makes use of three semantic similarity metrics, merged by applying late fusion.


Large-Scale Query-by-Image Video Retrieval Using Bloom Filters

We consider the problem of using image queries to retrieve videos from a database. Our focus is on large-scale applications, where it is infeasible to index each database video frame independently. Our main contribution is a framework based on Bloom filters, which can be used to index long video segments, enabling efficient image-to-video comparisons. Using this framework, we investigate severa...


UTwente does Brave New Tasks for MediaEval 2012: Searching and Hyperlinking

In this paper we report our experiments and results for the brave new searching and hyperlinking tasks for the MediaEval Benchmark Initiative 2012. The searching task involves finding target video segments based on a short natural language sentence query and the hyperlinking task involves finding links from the target video segments to other related video segments in the collection using a set ...



Journal title:
  • CoRR

Volume abs/1707.00251  Issue 

Pages  -

Publication date 2017